Semantic Document Selection - Historical Research on Collections That Span Multiple Centuries
Abstract
The availability of digitized collections of historical data, such as newspapers, increases every day, and so does historians' wish to explore these collections. Methods traditionally used to examine a collection do not scale up to today’s collection sizes. We propose a method that combines text mining with exploratory search to provide historians with a means of interactively selecting and inspecting relevant documents from very large collections. We assess our proposal with a case study on a prototype system.
Similar resources
Semantic Document Selection
The availability of digitized collections of historical data, such as newspapers, increases every day. With that, so does the wish for historians to explore these collections. Methods that are traditionally used to examine a collection do not scale up to today’s collection sizes. We propose a method that combines text mining with exploratory search to provide historians with a means of interact...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Finding Centuries-Old Hyperlinks: a Novel Semi-Supervised Shape Classifier
Hyperlinks are so useful for searching and browsing modern digital collections that researchers have long wondered if it is possible to retroactively add hyperlinks to digitized historical documents. There has already been significant research into this endeavor for historical text; however, in this work we consider the problem of adding hyperlinks among graphic elements. While such a system ...
SHAX: The Semantic Historical Archive eXplorer
Newspaper archives are some of the richest historical document collections. Their study is, however, very tedious: one needs to physically visit the archives, search through reams of old, very fragile paper, and manually assemble cross-references. We present Shax, a visual newspaper-archive exploration tool that takes large, historical archives as an input and allows interested parties to brows...
Conceptual changes of Mihrab, emphasizing on third and fourth century AH sources
Mihrab existed before Islam. After the Islamic conquest, this word came to denote one of the main components of Islamic mosques. Changes in the structure and function of Mihrab across these two periods can be studied in various ways, and examining conceptual change is one method of historical study. This article aims to investigate a part of conceptual changes that reflects Mihrab’s structural an...
Publication date: 2012